Local Word Vectors Guide Keyphrase Extraction

Authors

  • Eirini Papagiannopoulou
  • Grigorios Tsoumakas
Abstract

Word vector representation techniques, built on word-word co-occurrence statistics, often provide representations that capture the differences in meaning between words. This property is a powerful tool that can be exploited in a wide range of natural language processing tasks. In this work, we propose a simple and efficient unsupervised approach for keyphrase extraction, called the Reference Vector Algorithm (RVA), which uses a local word vector representation obtained by applying the GloVe method in the context of one scientific publication at a time. The mean word vector (reference vector) of the article's abstract then guides the selection of candidate keywords, using cosine similarity. The experimental results of a thorough evaluation show that our method outperforms state-of-the-art methods by providing high-quality keyphrases in most cases, thereby proposing an additional way of exploiting GloVe word vectors.
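The ranking step described in the abstract (average the abstract's word vectors into a reference vector, then score candidates by cosine similarity to it) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the three-dimensional toy vectors are hypothetical, and the vectors are hand-written stand-ins for what would in practice be GloVe vectors trained on the single document. How RVA forms multi-word keyphrases from the scored words is not stated in the abstract and is not modeled here.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_candidates(candidates, abstract_tokens, word_vectors):
    """Rank candidate words by similarity to the abstract's mean vector."""
    known = [word_vectors[t] for t in abstract_tokens if t in word_vectors]
    if not known:
        raise ValueError("no abstract token has a word vector")
    reference = mean_vector(known)  # the "reference vector"
    scored = [(w, cosine(word_vectors[w], reference))
              for w in candidates if w in word_vectors]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy vectors (assumption: real ones would come from GloVe
# trained on the target document only).
toy_vectors = {
    "keyphrase": [0.9, 0.1, 0.0],
    "extraction": [0.8, 0.2, 0.1],
    "vector": [0.7, 0.3, 0.0],
    "table": [0.0, 0.1, 0.9],
}
abstract_tokens = ["keyphrase", "extraction", "vector"]
ranking = rank_candidates(["table", "keyphrase", "vector"],
                          abstract_tokens, toy_vectors)
for word, score in ranking:
    print(f"{word}: {score:.3f}")
```

In this toy setup, words whose vectors point in roughly the same direction as the abstract's mean vector rank first, while the off-topic word "table" falls to the bottom.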

Similar resources

Local Word Vectors Guiding Keyphrase Extraction

Automated keyphrase extraction is a fundamental textual information processing task concerned with the selection of representative phrases from a document that summarize its content. This work presents a novel unsupervised method for keyphrase extraction, whose main innovation is the use of local word embeddings (in particular GloVe vectors), i.e. embeddings trained from the single document und...

Corpus-independent Generic Keyphrase Extraction Using Word Embedding Vectors

Keyphrase extraction from a given document is a difficult task that requires not only local statistical information but also extensive background knowledge. In this paper, we propose a graph-based ranking approach that uses information supplied by word embedding vectors as the background knowledge. We first introduce a weighting scheme that computes informativeness and phraseness scores of word...

273. Task 5. Keyphrase Extraction Based on Core Word Identification and Word Expansion

This paper describes the Hong Kong Polytechnic University (PolyU) system that participated in Task #5 of SemEval-2, i.e., the Automatic Keyphrase Extraction from Scientific Articles task. We followed a novel framework to develop our keyphrase extraction system, motivated by differentiating the roles of the words in a keyphrase. We first identified the core words which are de...

Keyphrase Extraction and Grouping Based on Association Rules

Keyphrases are important in capturing the content of a document and thus useful for many natural language processing tasks such as Information Retrieval, Document Classification, and Text Summarization. Keyphrase extraction aims to identify multi-word sequences from a collection of documents that more or less correspond to keyphrases. In this paper, we propose a new method for keyphrase extract...

Reducing Over-generation Errors for Automatic Keyphrase Extraction using Integer Linear Programming

We introduce a global inference model for keyphrase extraction that reduces overgeneration errors by weighting sets of keyphrase candidates according to their component words. Our model can be applied on top of any supervised or unsupervised word weighting function. Experimental results show a substantial improvement over commonly used word-based ranking approaches.

Journal:
  • CoRR

Volume abs/1710.07503

Publication date 2017